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Abstract 


We present the first almost-linear time algorithm for constructing linear-sized spectral spar¬ 
sification for graphs. This improves all previous constructions of linear-sized spectral sparsifica¬ 
tion, which requires time [BSS12, Zoul2, AZL015]. 

A key ingredient in our algorithm is a novel combination of two techniques used in literature 
for constructing spectral sparsification: Random sampling by effective resistance [SSlf], and 
adaptive construction based on barrier functions [BSS12, AZL015]. 

keywords: algorithmic spectral graph theory, spectral sparsification 


1 Introduction 


Graph sparsification is the procedure of approximating a graph G by a sparse graph G' such that 
certain quantities between G and G' are preserved. For instance, spanners are defined between two 
graphs in which the distances between any pair of vertices in these two graphs are approximately 
the same [Che89]; cut sparsifiers are reweighted sparse graphs of the original graphs such that 
the weights of every cut between the sparsifiers and the original graphs are approximatedly the 
same [BK96]. Since both storing and processing large-scale graphs are expensive, graph sparsifica¬ 
tion is one of the most fundamental building blocks in designing fast graph algorithms, including 
solving Laplacian systems [ST04, KMPIO, KMPll, KLP12, PS14, LPS15], designing approxima¬ 
tion algorithms for the maximum flow problem [BK96, KLOS14, Shel3], and solving streaming 
problems [KL13, KLM'''14]. Beyond graph problems, techniques developed for spectral sparsifica¬ 
tion are widely used in randomized linear algebra [Mahll, LMP13, CLM+lh], sparsifying linear 
programs [LS14], and various pure mathematics problems [SS12, Sril2, MSS13, Barl4]. 

In this work, we study spectral sparsification introduced by Spielman and Teng [STll]; A 
spectral sparsifier is a reweighted sparse subgraph of the original graph such that, for all real vectors, 
the Laplacian quadratic forms between that subgraph and the original graph are approximately the 
same. Formally, for any undirected and weighted graph G = {V, E, w) with n vertices and m edges, 
we call a subgraph G' of G, with proper reweighting of the edges, is a (1 + e)—spectral sparsifier if 
it holds for any x E that 

(1 — e)x'^ Lqx ^ x^ Lqix ^ (1 -|- £)x^Lqx, 

where Lq and Lq/ are the respective graph Laplacian matrices of G and G'. 

Spielman and Teng [STll] presented the hrst algorithm for constructing spectral sparsihca- 
tion. For any undirected graph G of re vertices, their algorithm runs in 0(re log'^ re/e^) time, for 
some big constant c, and produces a spectral sparsifier with 0{n\og^ re/e^) edges for some c' ^ 2. 
Since then, there has been a wealth of work on spectral sparsification. For instance, Spielman 
and Srivastava [SSll] presented a nearly-linear time algorithm for constructing a spectral sparsi¬ 
fier of 0(relogre/e^) edges. Batson, Spielman and Srivastava [BSS12] presented an algorithm for 
constructing spectral sparsifiers with Oinje^) edges, which is optimal up to a constant. However, 
all previous constructions either require H (re^"*"^) time in order to produce linear-sized sparsi¬ 
fiers [BSS12, Zoul2, AZL015], or 0{n\og^^^^ nje^) time but the number of edges in the sparsifiers 
is sub-optimal. 

In this paper we present the first almost-linear time algorithm for constructing linear-sized 
spectral sparsification for graphs. Our result is summarized as follows: 

Theorem 1.1. Given any integer q 10 and 0 < e ^ 1/120. Let G = {V,E,w) he an undirected 
and weighted graph with re vertices and m edges. Then, there is an algorithm that outputs a (l-|-e)- 
spectral sparsifier of G with O (|5) edges. The algorithm runs in O ^ ^ ^ time. 

Graph sparsification is known as a special case of sparsifying sums of rank-1 positive semi- 
definite (PSD) matrices [BSS12, SSll], and our algorithm works in this general setting as well. Our 
result is summarized as follows: 

Theorem 1.2. Given any integer q 10 and 0 < e ^ 1/120. Let I = o/ 

m rank-1 PSD matrices. Then, there is an algorithm that outputs scalers {si}'fLi with jjsj : s, / 
0}| = O (f?) such that 

m 

(1 - e) • / ^ ^ SiVivJ ^ (1 -h e) • /. 

i=l 
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The algorithm runs in O • n‘^ i+3/g^ time, where w is the matrix-multiplication constant. 

A key ingredient in our algorithm is a novel combination of two techniques used in literature for 
constructing spectral sparsification: Random sampling by effective resistance of edges [SSll], and 
adaptive construction based on barrier functions [BSS12, AZL015]. We will present an overview 
of the algorithm, and the intuitions behind it in Section 2. 


Preliminaries Let G = {V, E, w) be a connected, undirected and weighted graph with n vertices 
and m edges, and weight function w : V x V ^ The Laplacian matrix of G is an n by n 

matrix L defined by 


Lg{u,v) 


—w{u, v) 
< deg(M) 


if u ~ u, 
if u = u. 


0 otherwise. 


where deg(u) = 'w)- R is easy to see that 


x'^LgX = '^Wu,v{Xu 

U'^V 


Xvf ^ 0 , 


for any x G M". 

For any matrix A, let Aniax(A) and Aniin(^) be the maximum and minimum eigenvalues of A. 
The condition number of matrix A is defined by Amax(^)/Amin(A). For any two matrices A and 
B, we write A ^ B to represent B — A is positive semi-definite (PSD), and A ^ B to represent 
B — Ais positive definite. For any two matrices A and B of equal dimensions, let = tr (A^B). 

For any function /, we write 0{f) = 0{f ■ log*^T) j)_ pgr matrices A and B, we write A R if 
{l-e)-A^B^ (l + e)A. 


2 Algorithm 

We study the algorithm of sparsifying the sum of rank-1 PSD matrices in this section. Our goal is 
to, for any vectors vi, - ■ ■ Vm with scalars {si}™ satisfying 

such that 

m 

(1 - e) • / ^ ^ SiVivJ :< ■ I. 

i=l 

We will use this algorithm to construct graph sparsifiers in Section 3. 

2.1 Overview of Our Approach 

Our construction is based on a probabilistic view of the algorithm presented in Batson et al. [BSS12]. 
We refer their algorithm BSS for short, and give a brief overview of the BSS algorithm at first. 

At a high level, the BSS algorithm proceeds by iterations, and adds a rank-1 matrix c • VivJ 
with some scaling factor c to the currently constructed matrix Aj in iteration j. To control the 
spectral properties of matrix Aj, the algorithm maintains two barrier values Uj and ij, and initially 
uq > 0,io < 0. It was proven that one can always find a vector in {vi}'^^ and update Uj,^j in a 
proper manner in each iteration, such that the invariant 

£jl < Aj ■< Ujl ( 2 . 1 ) 
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always holds, [BSS12]. To guarantee this, Batson et al. [BSS12] introduces a potential function 

- A)-^ + tT{A - £1)-^ (2.2) 

to measure “how far the eigenvalues of A are from the barriers u and since a small value of 
iniplies that no eigenvalue of A is close to u or £. With the help of the potential function, it 
was proven that, after k = Q (u/e^) iterations, it holds that ^ cuk for some constant c, implying 
that the resulting matrix is a linear-sized and ~o(e) 

The original BSS algorithm is deterministic, and in each iteration the algorithm finds a rank-1 
matrix which maximizes certain quantities. To informally explain our algorithm, let us look at the 
following randomized variant of the BSS algorithm: In each iteration, we choose a vector Vi with 
probability pi, and add a rank-1 matrix 

A ^ ^ ^ T 

Aa = - • — • VivI 
t Pi 

to the current matrix A. See Algorithm 1 for formal description. 

Algorithm 1 Randomized BSS algorithm 
1: j = 0; 

2: £o = —Sn/e, uq = Sn/e; 

3: Ao = 0; 

4: while Uj — £j < 8n/e do 

5: Let t = tr {ujl — Aj)~^ + tr {Aj — £jl)~^; 

6: Sample a vector Vi with probability pi = (^vj {ujl — Aj)~^ Vi + vj {Aj — £jl)~^ Vi^ /t] 

7: Aj+i = Aj + l-j:-VivJ-, 

8: Uj+i = Uj + and £j+i = £j + 

9: j ^ j + 1; 

10 : Return Aj] 


Let us look at any fixed iteration j, and analyze how the added Aa impacts the potential 
function. We drop the subscript representing the iteration j for simplicity. After adding Aa, the 
first-order approximation of gives that 

+ Aa) ~ ^uA^) + AI - • Aa - (A - £1)-^ . Aa- (2.3) 


Since 


e[Aa] =ET=lP^■ 


Pi 


ViV 




we have that 


E [d'u/(A -|- Aa)] 


~ ^uAA) + I • (nl - A)-2 £1)-^ . / 

= d>n/(A) + j-iT{uI- A)“^ - I • tr (A - £I)~‘^ 


Notice that if we increase u by | and £ by |, ^u,e approximately increases by 


- • :^^uAA) + - 


M 


^uA^)- 
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Hence, comparing ^u+e/t,i+e/t{^ + with the increase of the potential function due to 

the change of barrier values is approximately compensated by the drop of the potential function 
by the effect of A^. For a more rigorous analysis, we need to look at the higher-order terms and 
increase u slightly more than i to compensate that. Batson et ah [BSS12] gives the following 
estimate: 


Lemma 2.1 ([BSS12], proof of Lemma 3.3 and 3.4). Let A E andu,l he parameters satisfying 

II < A < ul. Suppose that w E R” satisfies ww'^ R 6{ul — A) and ww'^ R 5{A — II) for some 
0 < J < 1. Then, it holds that 


+ ww'^) ^ -k 


w'^{uI — A) ‘^w 


w'^{A — n) 

TTj 


The estimate above shows that the first-order approximation (2.3) is good if ww'^ R 5{ul — A) 
and rctcT R 5{A — il) for small <5. It is easy to check that, by setting 6 = e, the added matrix 
satisfies these two conditions, since 


e 

t 



_£ • VjvJ _ 

vj {ul - A)~^ Vi + v] {A - II)~^ Vi 




£ • Viv] 

vJ {ul - A)~^ Vi 


R e {ul — A ), 


where we used the fact that vv'^ R {v^B ^v)B for any vector v and PSD matrix B. Similarly, we 
have that 

7 • — • VivJ R e{A - il). 

Hence, if is small initially, our crude calculations above gives a good approximation and 

^u,i{l^) is small throughout the executions of the whole algorithm. Up to a constant factor, this 
gives the same result as [BSS12], and therefore Algorithm 1 constructs an 0(n/e^)-sized (l-|-0(e))- 
spectral sparsifier. 

Our algorithm follows the same framework as Algorithm 1. However, to construct a spectral 
sparsifier in almost-linear time, we expect that the sampling probability {pi}^i of vectors (i) can 
be approximately computed fast, and (ii) can be further “reused” for a few iterations. 

For fast approximation of the sampling probabilities, we adopt the idea proposed in [AZL015]: 
Instead of defining the potential function by (2.2), we define the potential function by 


$„ 7 (A) a tr{ul - A)-’i + tr(A - £/)-'?. 


Since g is a large constant, the value of the potential function becomes larger when some eigenvalue 
of A is close to u or L Hence, a bounded value of ^u/{-^) insures that the eigenvalues of A never get 
too close to u or i, which further allows us to compute the sampling probabilities {pi}iLi efficiently 
simply by Taylor expansion. Moreover, by defining the potential function based on tr(-)“'^, one can 
prove a similar result as Lemma 2.1. This gives an alternative analysis of the algorithm presented in 
[AZL015], which is the first almost-quadratic time algorithm for constructing linear-sized spectral 
sparsifiers. 

To “reuse” the sampling probabilities, we re-compute {pi}'fLi after every 0 (n^“^A) iterations: 
We show that as long as the sampling probability satishes 

vJ {ul - A)~^ Vi + vJ {A - il)~^ Vi 

ET=i [vJ + vJ (A - £1)-^ Vi) 

for some constant C > 0, we can still sample Vi with probability pi and get the same guarantee 
on the potential function. The reason is as follows: Assume that Ayi = Yli=i ‘^A,i is the sum 


4 







of the sampled matrices within T = O iterations. If a randomly chosen matrix 

satisfies A (ul — A), then by the matrix Chernoff bound ^ ^ (ul — A) holds with high 
probability. By scaling every sampled rank-1 matrix q times smaller, the sampling probability only 
changes by a constant factor within T iterations. Since we choose 0(n/e^) vectors in total, our 
algorithm only recomputes the sampling probabilities 0 times. Hence, our algorithm runs 

in almost-linear time if g is a large constant. 


2.2 Algorithm Description 

The algorithm follows the same framework as Algorithm 1, and proceeds by iterations. Initially, 
the algorithm sets 

uo ^ {2nf/\ 4 = -(271)1/^ Ao ^ 0. 

After iteration j the algorithm updates Uj,ij by A^^-, A^j respectively, i.e., 

Uj-\-l Uj T AiiJ, 4 + 1 4 

and updates Aj with respect to the chosen matrix in iteration j. The choice of A„j and Ag^j insures 
that 

4 / ^ Aj -< Ujl 

holds for any j. In iteration j, the algorithm computes the relative effective resistance of vectors 
{vi}'^i defined by 

Ri {Aj,Uj,ij) A vj {ujl - Aj)~^Vi + vj {Aj - ijI)~^Vi, 


and samples Nj vectors independently with replacement, where vector Vi is chosen with probability 
proportional to Ri{Aj,Uj,ij), and 


Nn A - 







min {Amin ('Uj/ Aj),Aniin(Aj 4/)} • 


The algorithm sets Aj+i to be the sum of Aj and sampled VivJ with proper reweighting. For 
technical reasons, we define Auj and Agj by 


Au,j A (1 -)- 2e) 


e-Ni 


9- 


Agj A (1 _ 2e) 


e-Ni 


9- 


See Algorithm 2 for formal description. 

We remark that, although exact values of Nj and relative effective resistances are difficult to 
compute in almost-linear time, we can use approximated values of Ri and Nj instead. It is easy to 
see that in each iteration an over estimate of Ri, and an under estimate of Nj with constant-factor 
approximation suffice for our purpose. 


3 Analysis 

We analyze Algorithm 2 in this section. To make the calculation less messy, we assume the following: 
Assumption 3.1. We always assume that 0 < e ^ 1/120, and q is an integer satisfying q ^ 10. 
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Algorithm 2 Algorithm for constructing spectral sparsifiers 

Require: e ^ 1/120, g ^ 10 
1: j = 0; 

2: 4 = -(2n)V'?,^to = (2n)V'?,^o = Q; 

3: while Uj — £j < A - (2n)^/'^ do 
4 : 14 - = 0; 

5; Compute Ri{Aj,Uj,ij) for all vectors u*; 

6: Sample Nj vectors independently with replacement, where every Vi is chosen with probability 

proportional to Ri{Aj,Uj,ij). For every sampled v, add e/q ■ {Ri{Aj,Uj,ij))~^ ■ vv'^ to Wj] 

7: Aj+i = Aj + Wj; 

®' 4+1 — 4 ^i+i — 

9 : j =j + 1 ; 

10; Return Aj] 


Our analysis is based on a potential function £ with barrier values G M. Formally, for 
a symmetric matrix A G with eigenvalues Ai ^ ••• ^ Xn and parameters u,i satisfying 

£1 < A < ul, let 


A>u,i{A) ^ tT{uI - A)-i + tr(A - £/)-'? 

1 / 1 


E 

2=1 


u - Xi 


+E 

2=1 


Xi-£ 


(3.1) 


We will show how the potential function evolves after each iteration in Section 3.1. Combing this 
with the ending condition of the algorithm, we will prove in Section 3.2 that the algorithm outputs 
a linear-sized spectral sparsifier. We will prove Theorem 1.1 and Theorem 1.2 in Section 3.3. 


3.1 Analysis of a Single Iteration 


We analyze the sampling scheme within a single iteration, and drop the subscript representing the 
iteration j for simplicity. Recall that in each iteration the algorithm samples N vectors indepen¬ 
dently from V = satisfying where every vector Vi is sampled with probability 


Ri{A,u,£) 
E”Ai Rj{A,u,i) ■ 
vectors by 


We use ui, • • • ,vn to denote these N sampled vectors, and define the reweighted 


Wi = 


q ■ Ri{A,u,£) 


for any 1 ^ i ^ N . Let 


N 

w = Y^WiwJ, 

2 = 1 


and we use W ~ 'D{A,u,£) to represent that W is sampled in this way with parameters A,u and 
i. We will show that with high probability matrix W satisfies 0 ^ IF ^ ~ ^)- We first recall 

the following Matrix Chernoff Bound. 


Lemma 3.2 (Matrix Chernoff Bound, [Trol2]). Let {A^} be a finite sequence of independent, 
random, and self-adjoint matrices with dimension n. Assume that each random matrix satisfies 
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Xk ^ 0, and Amax(-^A:) ^ D. Let fj, ^ Amax [^A:])- Then, it holds for any 5^0 that 

J \ 


P 


■^max E Xk 1^(1 + 


^ n 


(l + 5)i+Z 


Lemma 3.3. Assume that the number of samples satisfies 

/ ^ \ 

2 


E Ri(A,U,£)\ • Ajnin('U/ yl). 


.*=1 


Then, it holds that 


and 


E[VP] = 


N 


Q Y^7LiRi{A,u,> 


■I, 


0 <W < -■ {ul - A) 


> 1 - 


lOOgn 


Proof. By the description of the sampling procedure, it holds that 

Ery;.y;Tl ^ _ Rj{A,U,i) _£_ VfvJ^ _ ^ £_1_ 

^ ^,ET=iRtiA,u,e) q R,{A,u,i) q ET=iRtiAu,^ 


and 


E[W] = E 


N 

WiwJ 

L i=l 


N 




I, 


which proves the first statement. 

Now for the second statement. Let 


It holds that 


Zj = {ul - A) ^1‘^Wi. 

tr(zizj) = tr (^{ul — A)~^^‘^WiwJ{uI — 

e tr [{ul — A)~^^‘^VivJ{ul — A)~^/‘^^ 




Ri{A,u,i) 
vj{ul - A)~^Vi 


q 

E 

q vj{ul - A)-^Vi + vJ{A - £I)-^Vi 
£ 

~ 5 

q 


and Xmax{zizJ) ^ |. Moreover, it holds that 


E 


N 

E 

2=1 


ZfZ- 


N 


-< 


q ET=iRtiA,u,i) 

E N 

q Yu=iRt{Au,T) 


■ {ul - A)-^ 
• An 


ul — A 


■ I. 


(3.2) 
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This implies that 


Amax E 


N 

E 

i=l 


ZiZ 


e 

€ - ■ 


N 


By setting 


<1 


N 


= 




•A, 


1 


1 


ul — A 


ul- A T 


it holds by the Matrix Chernoff Bound (cf. Lemma 3.2) that 


N 


^ n ' 


Amax l^^ZjzJj ^ (1 +5)fl 
Set the value of 1 + <5 to be 

'^Rj{A,u,£) 


(1 + - 5 ) 


l+<5 


tM-q/e 


l + S = — = 

2n 2eN 

q 


ki=i 


A. 


1 


max I uI-A 


2eN 


'^Rj{A,u,i)^ ■ X^iniul - A) 
i=i 


> — • 


where the last inequality follows from the condition on N. Hence, with probability at least 


1 — n ' 


(1 + (5)^+'^ 


j ^ 1 - n 


l + <5 




> 1 — n 


1 + 5 


> 1 - 


lOOgn 


we have that 



^ (1 + 5) • /J- 


1 

2 ’ 


which implies that 0 ^ Eh 


+ 


i — 2 


I and 0 + IT + 1 • {ul — A). 


Now we analyze the change of the potential function after each iteration, and show that the 

expected value of the potential function decreases over time. By Lemma 3.3, with probability at 
2 

least 1 — it holds that 

0 + IT + ]^[uI-A). 

We define 


E[/(VL)]4 j; ] 

W^V{A,u/) 


W is chosen and W + 2 ^^^ ~ 


f{W). 


Lemma 3.4 below shows how the potential function changes after each iteration, and plays a 
key role in our analysis. This lemma was first proved in [BSS12] for the case of g = 1, and was 
extended in [AZL015] to general values of q. For completeness, we include the proof of the lemma 
in the appendix. 






















Lemma 3.4 ([AZL015]). Let q It) and e ^ 1/10. Suppose that w'^{ul — A) ^ | and 
w'^{A — £I)~^w ^ It holds that 

tr(A + ww'^ — n)~^ ^ tr(4. — n)~^ — q{l — e) w'''(A — 


and 

tr(it/ — A — ^ ti(ul — A)~'^ + q{l + e) w'^{ul — A)~^'^~^^^w. 

Lemma 3.5. Let j be any iteration. It holds that 

Proof. Let • • • , be the matrices picked in iteration j, and define for any 0 ^ z ^ 

that 

i 

Bi = Aj + ^ wtwj. 
t=i 

We study the change of the potential function after adding a rank-1 matrix within each iteration. 
For this reason, we use 

A,, = ^ = {i + 2e) ^ 


N, 


Q-TnLiRt{Aj,Uj,£j)' 


and 


A, = ^ = (1-2.). 


yj q ■ Rt{Aj,Uj,£j) 

to express the average change of the barrier values and j. We further define for 0 ^ j ^ Nj 

that 

Ui = Uj + i ■ Au, £i = ij + i ■ Ai. 

Assuming 0 A Wj A \{ujl — A j), we claim that 


Wiw] A ^ • {uil - Bi_i) and Wiw] A ^ -^i^) i 

for any 1 ^ i ^ W. Based on this, we apply Lemma 3.4 and get that 


(3.3) 


E 


4>^, (Bj_i + Wiwl) ^ 4>^, + q{l + 2e)tr [{uil - Bi_i) 

-(<?+!) 


E [ WiwJ 


- q{l - 2e)tr ( ( - ij^ 

= + g • A, • tr (^{uj - 

- g-A, •tr((B,_i-i,/)-(''+!)) . 


(3.4) 


We define a function /j by 


/i(f) = tr (({ti_i + t • A„) / - Bi_i) ''+ tr + t • A^^. 

Notice that 

^ df^ = -g • • tr {{ui-i + t • Au) I - Bi_i) + q ■ Ai ■ tr (^Bi_i - + t ■ Ai'j ij 
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Since / is convex, we have that 


d/i(t) 


dt 


> /i(l) - /i(0) = 


(3.5) 


Putting (3.4) and (3.5) together, we have that 


E 




d/i(t) 


Repeat this argument, we have that 


dt 


^ $ 


t=i 




E[^uj+uij+Mj+i)] =]E 






which proves the statement. 

So, it suffices to prove the claim (3.3). Since vv'^ ^ [v^B~^v)B for any vector v and PSD matrix 
B, we have that 

^ 4 .. 

Ri{Aj,Uj,£j) vJ{ujI-Aj) ^Vi ^ ^ 

By the assumption of Wj ^ ^{ujl — Aj), it holds that 


WiwJ = 


qitiyAj, Uj, tj j q q 


This proves the first statement of the claim. 
For the second statement, notice that 


£j+i — £j — £^e,j ^ 


eNi 


q Rt{Aj , Uj,ij) 2 


^ r) ' '^min(^j 


and hence 


WiwJ < ^ {Aj - ijl) < j (^Aj - hl^ < j - iil 


3.2 Analysis of the Approximation Guarantee 

In this subsection we will prove that the algorithm produces a linear-sized (1 -|- 0(e))-spectral 
sparsifier. We assume that the algorithm finishes after k iterations, and will prove that the output 
Afc is a (1 -|- 0(e))-spectral sparsifier. It suffices to show that the condition number of Ak is small, 
which follows directly from our setting of parameters. 

Lemma 3.6. The output matrix Aj. has condition number at most 1 -|- 0(e). 

Proof. Since the condition number of Aj. is at most 

;wfc _ A _ Uk -4 \~^ 

f-fc \ J 

it suffices to prove that (uk — £k)/uk = 0{e). 
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Since the increase rate of — Aij with respect to for any iteration j is 
An,j “ A^j- _ (1 + 26:) — (1 — 2£) 4e 

A -I 1 rA 


A, 


1 + 2e 


1 + 2e 




we have that 


Uk — ' (2n)^/'^ + Ylj=o ^i,j) 

Uk (2n)V9 + 




,j=0 

2.(2n)V« + i:;:‘(A„,,-A^) 


(2n)i/, + (4E)-ij;‘z;(A„j-A,,_j)' 

By the ending condition of the algorithm, it holds that Uk — (-k ^ ' (2n)^/'^, i.e. 


fc-i 

j=0 


Hence, it holds that 


Uk-h ^ 2-(2n)V‘? + 2-(2n)Vg ^ 

Uk ^ (2n)V<? + (4e)"^2-(2n)V<? ^ 


which finishes the proof. 

Now we prove that the algorithm finishes in O ^^ iterations, and picks O (|5) vectors 
total. 

Lemma 3.7. The following statements hold: 

• With probability at least 4/5, the algorithm finishes in iterations. 

• With probability at least 4/5, the algorithm chooses at most vectors. 

Proof. Notice that after iteration j the barrier gap uj — ij is increased by 


m 




4e2 Nj 

4e^ 1 

• min {Amin(Uj,/ ^i)) ^min(^j 


2/g 


-1/q 


q n 

Since the algorithm finishes within k iterations if 

fc-i 

^(A„,, - A^,,) ^ 2 • {2ny/fi 

j=0 
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it holds that 


’ [ algorithm finishes within k iterations ] ^ 


fc-i 




3=0 






“o?" 


fc-l 






3=0 

k-1 


3=0 H \ / 


1/9 


where the last inequality follows from the fact that 

^k-l 


'k-l 

J=o 


-1/9 


,3=0 


1/9 


>k^. 


By Lemma 3.3, every picked matrix Wj in iteration j satisfies 

O^Wj {ujl - A) 

with probability at least 1 — and with probability 9/10 all matrices picked in k = 

iterations satisfy the condition above. Also, by Lemma 3.5 we have that 


E 


3=0 


k-l 


k-l 


^E [(c1>,^,,,^,(A,))V9] ^ ^ (e ^ k, (3.6) 


3=0 j=0 

since the initial value of the potential function is at most 1. Therefore, it holds that 


P [ algorithm finishes in more than k iterations 

'.2.-2 / I \ 1/9 

q \2n^) 




E(-i>...L'4j))‘'’>2.A!./EV 




3=0 

k-l 


^ (Aj)) ^ 2 • ^ • i^—j and Vj : Wj ^ -(ujl - Aj) 


3=0 




+ ] 
q 


2 ■ ke^ 


^j-.Wj^-{u,I-Aj) 

■ + 1/10 ^ 1/5, 


where the second last inequity follows from Markov’s inequality and (3.6), and the last inequality 
follows by our choice of k. This proves the first statement. 
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Now for the second statement. Notice that for every vector chosen in iteration j, the barrier 
gap /S.u,j — ^t,j is increased on average by 

4e^ 


^u,j 

Ni 


Q i^j ) ) 


To bound Ri{Aj,Uj,£j), let the eigenvalues of matrix be Ai, • • • , A„,. Then, it holds that 

m 


^^Ri{Aj,Uj,ij) — '^^vj{ujl Aj) Vi + '^^vJ{Aj ijl) i 


2=1 


2=1 


2 = 1 


n ^ n - 

J2u-X- 

Ujj /\i . /\i To 

2 = 1 2=1 


Vi=l i=l 

= ,(,(.4,))'''"-(an)*-'/’ 


I 


\ 1/9 


Therefore, we have that 




N, 


q (2n)i-V9.(c^„^^_,^,(yl^.))i/9' 


(3.7) 


Let ui, • • • ,Vz be the vectors sampled by the algorithm, and vj is picked in iteration tj, where 
1 ^ j ^ 2 . We first assume that the algorithm could check the ending condition after adding every 
single vector. In such case, it holds that 


P [ algorithm finishes after choosing z vectors 

p4e2 1 

^ q ■(2n)i-V9.($ , (^,^))i/9 




^ 2 • (2n)i/9 




i=i 

Following the same proof as the first part and noticing that in the final iteration the algorithm 
chooses at most 0{n) extra vectors, we obtain the second statement. ■ 


3.3 Proof of the Main Results 

Now we analyze the runtime of the algorithm, and prove the main results. We first analyze the 
algorithm for sparsifying sums of rank-1 PSD matrices, and prove Theorem 1.2. 

Proof of Theorem 1.2. By Lemma 3.7, with probability at least 4/5 the algorithm chooses at most 
vectors, and by Lemma 3.6 the condition number of Ak is at most 1 -|- 0(e), implying that 
the matrix is a (1 -|- 0(e))-approximation of I. These two results together prove that Ak is a 
linear-sized spectral sparsifier. 

For the runtime. Lemma 3.7 proves that the algorithm finishes in iterations, and it is 

easy to see that all the required quantities in each iteration can be approximately computed in 
0(m • time using fast matrix multiplication. Therefore, the total runtime of the algorithm is 

0 (^ .^-^- 1 + 3 / 9 ), ■ 
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Next we show how to apply our algorithm in the graph setting, and prove Theorem 1.1. Let 
L = Uiul be the Laplacian matrix of an undirected graph G, where UiuJ is the Laplacian 

matrix of the graph consisting of a single edge e^. By setting 

Vi = 


for 1 ^ ^ m, it is easy to see that constructing a spectral sparsifier of G is equivalent to sparsifing 

the matrix YllLi ■ We will present in the appendix almost-linear time algorithms to approximate 
the required quantities 

-^min ’ -^min i '^i (.^j^ and i-^j 


in each iteration, and this gives Theorem 1.1. 


Proof of Theorem 1.1. By applying the same analysis as in the proof of Theorem 1.2, we know that 
the output matrix is a linear-sized spectral sparsifier, and it suffices to analyze the runtime of 
the algorithm. 

By Lemma 3.3 and the Union Bound, with probability at least 9/10 all the matrices picked in 
k = iterations satisfy 

Conditioning on the event, with constant probability E ^ 2 for all iterations j, and 

by Markov’s inequality with high probability it holds that = O for all iterations j. 

On the other hand, notice that it holds for any 1 ^ j ^ n that 

n 

{u - A,)-^ ^ ^(n - A,)-" < 

i=l 

which implies that Xj < u — • Similarly, it holds that Xj > ^ + 

1 ^ 3 ^ n. Therefore, we have that 


+ 0 



I <Aj< 



I. 


Since both of uj and Ij are of the order 0(n^/'^), we set rj 


O ((e/n)^/^) and obtain that 


{ij + -< Aj -< (1 - ri)ujl. 


Hence, we apply Lemma 4.5 and Lemma 4.6 to compute all required quantities in each iteration 
up to constant approximation in time 


O 


m 
■ r] 


O 


( m ■ 

e2-H2/<? 


Since by Lemma 3.7 the algorithm finishes in iterations with probability at least 4/5, the 

total runtime of the algorithm is 

A/q\ 


O 


q ■ m ■ n" 
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4 Omitted Proofs 


4.1 Estimates of the Potential Functions 


In this subsection we prove Lemma 3.4. We first list the following two lemmas, which will be used 
in our proof. 


Lemma 4.1 (Sherman-Morrison Formula). Let A € be an invertible matrix, and u, u € M”. 

Suppose that 1 + A~^u / 0. Then it holds that 


{A + uv'^) ^ 


A ^uv'^A ^ 

1 + v'^A~^u ' 


Lemma 4.2 (Lieb Thirring Inequality, [LT76]). Let A and B be positive definite matrices, and 
q 1. Then it holds that 

tviBABy ^ tr(.BM'?S'^). 


Proof of Lemma 3.4- Let Y = A — il. By the Sherman-Morrison Formula (Lemma 4.1), it holds 
that 

tr(y + w»T)-. = t,(y-.-__^) , (4,1) 

By the assumption of w^Y~^w ^ we have that 


tr(F -|- ww'^) ^ tr I y ^ — 




= tr y-^/2 J_ 


l+e/q 

y-^/ 2 y^u;Ty-i/ 2 ' 


^ tr y-9/2 J _ 


= tr y-M I - 


l + e/q 

y-l/2y^y^Ty-l/2\ 

1 + e/g ) 

y-l/2y,y;Ty-l/2y^ 

l + e/9 / , 


y-l/2 


y- 9/2 


(4.2) 


(4.3) 

(4.4) 


where (4.2) uses the fact that A Y B implies that tr(^'^) ^ tr(i?'^), (4.3) follows from the Lieb- 
Thirring inequality (Lemma 4.2), and (4.4) uses the fact that the trace is invariant under cyclic 
permutations. 

Let 


y-i/2y;^i;Ty-i/2 

l + e/q 


Note that 0 + D Y ^ ■ I, and 


{I-Oy Y I-qD + ^ 

+ I-(q-^-^)D 
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Therefore, we have that 


I - 


y-i/2y;^^Ty-i/2' 

l + e/q 




£{q — 1)\ Y ^/'^ww'^Y 


- 


l + e/q 

(i - £) y-vWF-‘/= 

5 ti _ r-‘/Wr-i/2 


<I -q{l-£) Y-^/^ww'^Y-^/^. 

This implies that 

tr(F + ww'^)~^ ^ tr (y~'^ (j — q{l — £)Y~^/'^ww'^Y~^/'^y^ ^ tr (T“'^) — q{l — e) w^Y~^'^~^^\ 
which proves the first statement. 

Now for the second inequality. Let Z = uI—A. By the Sherman-Morrison Formula (Lemma 4.1), 
it holds that 




I ry T\-0 / ry-\ ^ ^WW^Z ^ 

ti(Z — ww^) ^ = tr Z ^ H--—j— ' 

1 — VJT Z~^w 


By the assumption of Z ^ it holds that 


tr(Z — wvY) ^ tr I Z ^ + 


Z-^ww^Z-^\‘^ 
l-e/q 


(4.5) 


(4.6) 




^ tr Z-^/^ I + 




= tr Z-'' / + 


l-e/q j 

y-l/2y;y;Ty-l/2y' 

l-e/q ] 


Z-q/2 


(4.7) 


(4.8) 


where (4.6) uses the fact that A < B implies that tr(^'^) ^ tr(i?'^), (4.7) follows from the Lieb- 
Thirring inequality (Lemma 4.2), and (4.8) uses the fact that the trace is invariant under cyclic 
permutations. 

Let 

E = Z-^/^ww^Z-^/^. 

Combing E Y ^ ■ I with the assumption that g ^ 10 and e ^ 1/10, we have that 


1 + 


E 


l-e/q 




qE ^ q{q-l) 


l-e/q 


1 + 


e/q 

l-e/q 

+ I + q{l + l.y ^ E + 1.4^^y^F;2 


q-2 


E 


l-e/q 


+ I + q{l + 0.3e) E + OAeqE 
+ I + q{l + e)E. 

Therefore, we have that 

tr(Z — ww^)~‘^ ^ tr {Z~'^) + q{l + e) w'^ 
which proves the second statement. 
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4.2 Implementation of the Algorithm 

In this section, we show that the algorithm for constructing graph sparsification runs in almost- 
linear time. Based on previous discussion, we only need to prove that, for any iteration j, the 
number of samples Nj and can be approximately computed in almost-linear 

time. By definition, it suffices to compute Amin {ujl — Aj), Amin {Aj — ^jl), vj [ujl — Aj)~^ Uj, and 
vj {Aj — Vi for all i. For simplicity we drop the subscript j expressing the iterations in 

this subsection. We will assume that the following assumption holds on A. We remark that an 
almost-linear time algorithm for computing similar quantities was shown in [AZL015]. 

Assumption 4.3. Let L and L be the Laplacian matrices of graph G and its subgraph after reweight¬ 
ing. Let A = and assume that 

{£ \£\r]) ■ L ^ A ^ {I — r])u ■ I 


holds for some 0 < rj < 1. 

Lemma 4.4. Under Assumption f.3, the following statements hold: 

• We can construct a matrix Su such that 

Su ^e/io (n/-A)-V2, 

and Su = piA) for a polynomial p of degree O _ 

• We can construct a matrix Si such that 

Si-e/io {A-n)-^/^ 

Moreover, Si is of the form {A')~^/‘^q{{A')~^),where q is a polynomial of degree O 
and A' = L-^/^L'L-^/‘^ for some Laplacian matrix L'. 

Proof. By Taylor expansion, it holds that 

oo k—1 




k=lj=0 


We define for any T G N that 


T fc-l 

Pt{x) = 1 + X^n (■^’ + 
k=lj=0 


lA 

2 Id' 


Then, it holds for any 0 < x < 1 — ?? that 


- ^ 1 - 1/2 = 


OO fc —1 


Prix) ^ (1 - x) 


pt{x)+ n (-^’+ 


fe=r+i j=o 

oo 


X^ 


2 k\ 


^ Pt{x) Y 
fc=r+i 

(1 - 


^ Pt{x) + 


V 
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Hence, it holds that 


{ul — A) — u ^A) ^ tt ^A), 

and 

(^/ _ A)-V2 ^ ^-1/2 • /) , 

since ^ (1 — r/)I. Notice that ^ (ul — and therefore 

{ul - H)-V2 ^ u-^/‘^pt{u-^A) + • («I - H)-V2. 

p 

Setting T = for some constant c and defining = u~^^'^pt{u~^A) gives us that 

Su ^e/lO {uI-A)-^/\ 

Now for the second statement. Our construction of Si is based on the case distinction (I > 0, 
and ^ 0). 

Case (1): i Notice that 

{A - = A-^/^{I - iA-^)-^/^, 

and 

PT{iA-^) <{I- ^ PT (M-1) + • I- 

Using the same analysis as before, we have that 

yl-i/2(j _ -e/io A-^/^pT{iAY- 

By defining Si = A~^^‘^pT{iA~^), i.e.. A' = A and q ((H')“^) = pt(£A~^), we have that 

Si-e/io {A-nYi\ 

Case (2): I ^ 0. We look at the matrix 

A-ii = -ei = - iL)L-^/‘^. 

Notice that L — iL is a Laplacian matrix, and hence this reduces to the case of = 0, for which we 
simply set Si = {A — Therefore, we can write Si as a desired form, where A' = A — il and 

polynomial q = 1- ■ 

Lemma 4.5 below shows how to estimate vj{ul — A)~^Vi, and vJ(A — iI)~^Vi, for all Vi in 
nearly-linear time. 

Lemma 4.5. Let A = YY=i suppose that A satisfies Assumption ^.3. Then, we can 

compute {ri}Y in O 

(1 - £)ri ^ vj{ul - AYvi ^ (1 + £)ri, 


and 

(1 - e)ti ^ vJ{A - ilYvi ^ (1 + £)ti. 
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Proof. Define Ui = for any 1 ^ i ^ m. By Lemma 4.4, we have that 


vJ{uI-A) Vi \\p{A)i 


p (l-V2Zl-V2) L-^/\i 


L^/^p (L-^l] L 


Let L = B^B for some B G 


Then, it holds that 


- 1 , 


Vj{ul-A) ^Vi ^3^/10 


Bp[L-^L) L-^Ui 


-i„ 


We invoke the Johnson-Lindenstrauss Lemma and find a random matrix Q G i^O(iog>T-/e^)xm-: With 
high probability, it holds that 


v]{ul - A) «4£/10 


QBp[L-^L] L-^Ui 




We apply a nearly-linear time Laplacian solver to compute 


QBp[L-^L] L-^u. 


for all {ui}ifi 


up to (1 ± e/10)-multiplicative error in time O ( ^ ). This gives the desired 


The computation for {L}™ ^ is similar. By Lemma 4.4, it holds for any 1 ^ i ^ m that 
vj{A - £I)-^Vi {A')-^/'^q{{A')-^)vi 

{A'r^^^c 


')-'lV2) l-^/\ 


Let L' = {B'y{B') for some B' G Then, it holds that 

vj{A - n)-^Vi « 3 £/io {L')~^^‘^q {L{L')-^) u, 

iL'y/yLT\{LiL')-^)u, 


= \\B'{Lr\{L{L')-^) 


Ui 


We invoke the Johnson-Lindenstrauss Lemma and a nearly-linear time Laplacian solver as before 
to obtain required The total runtime is O 




Lemma 4.6 shows that how to approximate X^i^ul — A) and X^i^A — II) in nearly-linear time. 


Lemma 4.6. Under Assumption 4-3, we ean compute values a, 13 in O 

(1 - e)a ^ X^i^ul - ^ (1 -b e)a 

and 

(1 — e)f3 ^ Amin(^ ~ £1) ^ (1 + £)f3. 


Tje- 


time such that 
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Proof. By Lemma 4.4, we have that Hence, Aniax(*S'ii) X^iyi{ul— A), 

and it suffices to estimate Aniax('S'u). Since 

Ama.(5„) ^ (tr (5f ^ 

by picking k = logn/e we have that (tr(S'2^))^'^^^ ^e /2 ^■aiay.i.Su). Notice that 
tr (Sf) = tr = tr (^■'^)) • 

Set L = B'^B for some matrix B G and we have that tr (S2*^) = tr ^p2fc (^BL~^B'^^^. 

Since we can apply ^BL~^B'^^ to vectors in O time, we invoke the Johnson-Lindenstrauss 

Lemma and approximate tr ( 52 ^) in O time. 

We approximate Amin(H — il) in a similar way. Notice that 

tr(sf)=tr((A')-'-'"5((A')-‘))" 

= tr(5((.4')-‘)(.4')-‘9((.4')-‘))“. 

Let 2 ; be a polynomial defined by z(x) = xq^(x) and L' = {B'y{B'). Then, we have that 
tr(Sf) = tr = tr (^lH2(l')-1lH2^^ . 

Applying the same analysis as before, we can estimate the trace in O time. ■ 
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